DF with length as a parameter - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

When the DF (document frequency) of a word is 0.1, that amounts to saying the word appears in 1 out of every 10 documents.

Naturally, the longer a document is, the higher this probability of appearance p becomes.
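As a rough illustration, assume (hypothetically) that each token of a document is independently the word w with some fixed per-token rate q_w. Real words tend to cluster, so this is only an idealization, but under it p rises with the document length n like this:

```latex
p(w, n) = 1 - (1 - q_w)^{n} \approx 1 - e^{-q_w n}

% e.g. with q_w = 0.001:
%   n = 100   ->  p \approx 0.095
%   n = 1000  ->  p \approx 0.632
```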

So it seems strange to treat p as a function of the word w alone.

Wouldn't it be better to build a model that estimates p from document length n and word w?
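A minimal sketch of one such model follows. It rests on an assumed simplification: each token is independently the word w with a fixed per-token rate q_w, so that p(w, n) = 1 - (1 - q_w)^n. The function names `presence_prob` and `fit_rate` are hypothetical, and documents are assumed to be already tokenized into lists of words. Using only whether w occurs in each document and how long each document is, it fits q_w by maximum likelihood. Documents of different lengths therefore all contribute to a single estimate, and the fitted q_w then gives p for any length n.

```python
import math
from typing import List, Sequence


def presence_prob(q: float, n: int) -> float:
    """Probability that a word occurs at least once in an n-token document,
    assuming (hypothetically) each token is independently that word with probability q."""
    # 1 - (1 - q)**n, computed stably for small q
    return -math.expm1(n * math.log1p(-q))


def fit_rate(docs: Sequence[Sequence[str]], word: str, iters: int = 100) -> float:
    """Maximum-likelihood estimate of the per-token rate q of `word`,
    using only document lengths and whether `word` occurs in each document."""
    data = [(len(d), word in d) for d in docs if len(d) > 0]
    if not any(present for _, present in data):
        return 0.0  # never observed: the likelihood is maximized at q = 0
    if all(present for _, present in data):
        return 1.0  # observed everywhere: the likelihood increases up to q = 1

    def score(q: float) -> float:
        """Derivative of the log-likelihood with respect to q."""
        total = 0.0
        for n, present in data:
            if present:
                # d/dq log(1 - (1 - q)**n)
                total += n * math.exp((n - 1) * math.log1p(-q)) / presence_prob(q, n)
            else:
                # d/dq log((1 - q)**n)
                total -= n / (1.0 - q)
        return total

    # The log-likelihood is concave in q, so the score decreases; bisect for its zero.
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Toy corpus (hypothetical): ten 10-token documents, three of which contain "a".
    docs: List[List[str]] = [["a"] + ["b"] * 9] * 3 + [["b"] * 10] * 7
    q = fit_rate(docs, "a")
    print(f"q = {q:.5f}")
    for n in (10, 100, 1000):
        print(n, round(presence_prob(q, n), 3))
```

With this toy corpus the fit reproduces p = 0.3 at n = 10 (the observed 3 in 10) and predicts roughly 0.97 for a 100-token document. Because the independence assumption ignores how words cluster within documents, this is only a first approximation of a length-aware DF.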

---

This page is auto-translated from /nishio/長さをパラメータにしたDF using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.